An introduction to R

author: Jorge Cimentada and Basilio Moreno date: 6th of July 2019 class: section font-family: ‘Helvetica’ width: 1800 height: 900

Subsetting in R

{r, echo = F} set.seed(2131)

Alright, so far we have seen vectors, matrices and data frames.

x <- sample(1:10)
x

We have 10 random numbers.

Their positions are:

{r, echo = F} setNames(x, 1:10)

Subsetting in R

If x is: {r, echo = F} x

what is the result of: ```{r, eval = F} x[c(1, 3, 8)] #Watch out for square brackets.

x[c(-1, -5)]

x[seq(1, 8, 2)]

x[NA]

x[]


Write it down without running it!

Subsetting in R
========================================================

Do these subsetting rules apply the same for all types of vectors?
```{r}
char <- letters[1:10]
lgl <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
gender <- factor(sample(c("female", "male"), 10, replace = T))

What about these ones?

{r, eval = F} char[c(1, 1, 1)] lgl[c(TRUE, 5, 1)] gender[c(1:3, TRUE)]

Super test: {r, eval = F} super_vector <- c(char, gender, lgl) super_vector[c(1, 11, 27)]

Subsetting in R

Subsetting rules are the same for all types of vectors.

Exceptions are:

Let’s go through each one…

Subsetting in R

incremental: true If you remember correctly, matrices are a vector with rows rows and columns.

x_matrix <- matrix(1:10, 5, 2) # 5 rows and 2 columns
x_matrix

Building on the previous examples, what wouldl be the result of this? {r, eval = F} x_matrix[c(1, 4, 6)]

To confuse you even more, what do you think would be the result of this? {r, eval = F} x_matrix[2:3, ]

Subsetting in R

A matrix can be thought of as two things:


* Or a numeric vector with rows and columns
```{r, echo = F}
x_matrix

Subsetting in R

Now that you know.. what are the results of: ```{r, eval = F} x_matrix[1:5, 2]

x_matrix[, 2]

x_matrix[1, 1]

x_matrix[1:10, 2]

x_matrix[, 1:2]


Subsetting in R
========================================================

Now, data frame are very similar to matrices.

```{r, echo = F}
set.seed(21)
our_df <- data.frame(letters = letters[1:10], age = sample(25:50, 10),
                     lgl = sample(c(TRUE, FALSE), 10, replace = T))
our_df

Subsetting in R

The same way matrices are subsetted!

```{r, eval = F} # First 3 rows for all columns our_df[1:3, ]

Only the first and 8th row for first two columns

our_df[c(1, 8), 1:2]

The 5th column three times for the third column

our_df[c(5, 5, 5), 3]


What? Why is the last one a vector?

Subsetting in R
========================================================

So far we saw how to subset the same way we subset matrices.

* Data frames are lists, remember?
* They also have similar subsetting rules to lists.

```{r, eval = F}
# We lose the data frame dimensions using this method.
our_df[["age"]]

# We get a data frame with this one.
our_df["age"] 

# We don't get a data frame here.
our_df$age

Subsetting in R

Following the ‘list’ subsetting rules for data frames:

The result should be: {r, echo = F} our_df$age[c(3, 4, 9)]

Subsetting in R

Well, now that we’re at it… How does it work for lists?

our_list <- list(data = our_df, x_matrix, gnd = gender)

Explanation

Subsetting in R

{r, eval = F} ourlist

Subsetting in R

{r, eval = F} ourlist[1]

Subsetting in R

{r, eval = F} ourlist[[1]]

Subsetting in R

{r, eval = F} ourlist[[1]][[1]]

========================================================

How do we create variables inside data frames or matrices?

Subsetting in R

incremental: true

What does this return? ```{r, eval = F} our_df[[“our_variable”]]

our_df[“our_variable”]

our_df$our_variable


* Nothing!
* We're subsetting a variable that doesn't exist

* What is missing to create this variable?

Subsetting in R
========================================================
incremental: true

Three ways of creating a variable:
```{r, eval = F}
our_df[["our_variable"]] <- 1:10

our_df["our_variable"] <- 11:20

our_df$our_variable <- seq(1, 20, 2)

There’s one other way of doing it… Think hard about [] and the , to divide rows and columns

our_df[, "our_variable"] <- "this repeats until end"

Subsetting in R

incremental: true

Add two variables to the our_df data frame from any of the options above.

Call them whatever you want.

our_df$lgl_two <- our_df$age >= 35
our_df$add <- our_df$age + our_df$lgl

Subsetting in R

When whe subset we almost always don’t subset like we’ve been doing.

You have all the tools to achieve this, can you tell me how to do this?

Subsetting in R

Ok, we only want people with ages below 40 years old.

{r, eval = F} age < 40

Everything set!

Subsetting in R

our_df$age < 40

Subsetting in R

incremental: true

our_df[c(2, 4, 7, 8, 10), ]
our_df[our_df$age < 40, ]

Subsetting in R

We can subset pretty much anything with logical vectors.

{r, eval = F} gender[gender == "female"] lgl[lgl == TRUE]

Always think about the details!

gender == "female" # is a logical statement

We could’ve written:

gender[c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)]

But that’s too long.

Functions in R

Let’s move on to functions.

What are functions?

All at the same time!

Functions in R

For example, take the sd function (standard deviation).

class(x)
class(sd)

Functions in R

x
sd

Functions in R

{r, eval = F} sd(x)

returns the standard deviation of a variable

When you have questions about a function type ?function_name

Functions in R

incremental: true

x <- rnorm(100)
y <- x + rnorm(100, mean = 1, sd = 1)
cor(x, y, method = "spearman")

======================================================== # To be continued….